NSF PAR Search | NSF Public Access Repository

SetBERT: the deep learning platform for contextualized embeddings and explainable predictions from high-throughput sequencing

https://doi.org/10.1093/bioinformatics/btaf370

Ludwig, II, David W.; Guptil, Christopher; Alexander, Nicholas R.; Zhalnina, Kateryna; Wipf, Edi M.-L.; Khasanova, Albina; Barber, Nicholas A.; Swingley, Wesley; Walker, Donald M.; Phillips, Joshua L.; et al. (June 2025, Bioinformatics)

Motivation: High-throughput sequencing (HTS) is a modern sequencing technology used to profile microbiomes by sequencing thousands of short genomic fragments from the microorganisms within a given sample. This technology presents a unique opportunity for artificial intelligence to comprehend the underlying functional relationships of microbial communities. However, due to the unstructured nature of HTS data, nearly all computational models are limited to processing DNA sequences individually. This limitation causes them to miss key interactions between microorganisms, significantly hindering our understanding of how these interactions influence microbial communities as a whole. Furthermore, most computational methods rely on post-processing of samples, which can introduce unintended protocol-specific bias.

Results: Addressing these concerns, we present SetBERT, a robust pre-training methodology for creating generalized deep learning models that process HTS data to produce contextualized embeddings and can be fine-tuned for downstream tasks with explainable predictions. By leveraging sequence interactions, we show that SetBERT significantly outperforms other models in taxonomic classification, with a genus-level classification accuracy of 95%. Furthermore, we demonstrate that SetBERT accurately explains its predictions autonomously, as confirmed by the biological relevance of the taxa the model identifies.

Availability and implementation: All source code is available at https://github.com/DLii-Research/setbert. SetBERT may be used through the q2-deepdna QIIME 2 plugin, whose source code is available at https://github.com/DLii-Research/q2-deepdna.
Multiple constraints cause positive and negative feedbacks limiting grassland soil CO₂ efflux under CO₂ enrichment

https://doi.org/10.1073/pnas.2008284117

Fay, Philip A.; Hui, Dafeng; Jackson, Robert B.; Collins, Harold P.; Reichmann, Lara G.; Aspinwall, Michael J.; Jin, Virginia L.; Khasanova, Albina R.; Heckman, Robert W.; Polley, H. Wayne (January 2021, Proceedings of the National Academy of Sciences)
Terrestrial ecosystems are increasingly enriched with resources such as atmospheric CO₂ that limit ecosystem processes. The consequences for ecosystem carbon cycling depend on the feedbacks from other limiting resources and plant community change, which remain poorly understood for soil CO₂ efflux (J_CO₂), a primary carbon flux from the biosphere to the atmosphere. We applied a unique CO₂ enrichment gradient (250 to 500 µL L⁻¹) for eight years to grassland plant communities on soils from different landscape positions. We identified the trajectory of J_CO₂ responses and feedbacks from other resources, plant diversity [effective species richness, exp(H)], and community change (plant species turnover). We found linear increases in J_CO₂ on an alluvial sandy loam and a lowland clay soil, and an asymptotic increase on an upland silty clay soil. Structural equation modeling identified CO₂ as the dominant limitation on J_CO₂ on the clay soil. In contrast with theory predicting limitation from a single limiting factor, the linear J_CO₂ response on the sandy loam was reinforced by positive feedbacks from aboveground net primary productivity and exp(H), while the asymptotic J_CO₂ response on the silty clay arose from a net negative feedback among exp(H), species turnover, and soil water potential. These findings support a multiple-resource limitation view of the effects of global change drivers on grassland ecosystem carbon cycling and highlight a crucial role for positive or negative feedbacks between limiting resources and plant community structure. Incorporating these feedbacks will improve models of terrestrial carbon sequestration and ecosystem services.
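The abstract reports plant diversity as effective species richness, exp(H). In the standard definition, H is the Shannon index, H = -Σ pᵢ ln pᵢ over the relative abundances pᵢ of the species present, and exp(H) is the number of equally abundant species that would yield the same H. The sketch below is illustrative only: it assumes the natural logarithm and raw abundance values as input, and the function name is hypothetical; the abstract does not say which abundance measure (cover, biomass, or counts) the study used.

```python
import math


def effective_species_richness(abundances):
    """Hypothetical helper: return exp(H), where H = -sum(p_i * ln p_i)
    is the Shannon index over relative abundances p_i.
    Zero abundances are skipped, since p * ln(p) -> 0 as p -> 0."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    h = 0.0
    for a in abundances:
        if a < 0:
            raise ValueError("abundances must be non-negative")
        if a > 0:
            p = a / total          # relative abundance of this species
            h -= p * math.log(p)   # accumulate the Shannon index (natural log)
    return math.exp(h)


# Four equally abundant species: prints a value very close to 4.0
print(effective_species_richness([10, 10, 10, 10]))
# A community dominated by one of two species counts as fewer than 2 "effective" species
print(effective_species_richness([90, 10]))
```

Because exp(H) equals the species count only when all species are equally abundant, it shrinks toward 1 as a single species dominates, which is why it is read as an "effective" number of species rather than a raw tally.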
The Ecology Underground coalition: building a collaborative future of belowground ecology and ecologists

https://doi.org/10.1111/nph.17163

Defrenne, Camille E.; Abs, Elsa; Longhi Cordeiro, Amanda; Dietterich, Lee; Hough, Moira; Jones, Jennifer M.; Kivlin, Stephanie N.; Chen, Weile; Cusack, Daniela; Franco, André L. C.; et al. (February 2021, New Phytologist)
